ABSTRACT

We are told that an object is hidden in one of $m$ ($m < \infty$) boxes and we are given prior probabilities $p_i^0$ that the object is in the $i$th box. A search of box $i$ costs $c_i$ and finds the object with probability $\alpha_i$ if the object is in the box. Also, we suppose that a reward $R_i$ is earned if the object is found in the $i$th box. A strategy is any rule for determining when to search and, if so, which box. The major result is that an optimal strategy either searches a box with maximal value of $\alpha_i p_i / c_i$ or else it never searches those boxes. Also, if rewards are equal, then an optimal strategy either searches a box with maximal $\alpha_i p_i / c_i$ or else it stops.

1. Introduction and Summary

The following model has been considered in the literature: We are told that an object is hidden in one of $m$ boxes and we are given prior probabilities $p_i^0$, $i = 1, 2, \ldots, m$ ($\sum_i p_i^0 = 1$), that the object is in the $i$th box. A search of box $i$ costs $c_i$ ($c_i > 0$) and finds the object with probability $\alpha_i$ if the object is in the box (i.e., $1 - \alpha_i$ is the overlook probability for the $i$th box). At the beginning of each time period $t = 1, 2, \ldots$ a box is searched, and the process ends when the object is found.

Blackwell (see [5]) has shown that the strategy which at time $t$ searches a box with the largest present value of $\alpha_i p_i / c_i$ minimizes the expected searching cost (where $p_i$ is the posterior probability at time $t$ that the object is in box $i$). Chew [3] and Kadane [4] have shown that if $c_i \equiv 1$ then this strategy also maximizes the probability that the searching cost will be less than $A$ for every $A > 0$.

In this paper, in order to motivate the search, we suppose that a reward $R_i$, $i = 1, \ldots, m$, is earned if the object is found in the $i$th box. We also suppose that the searcher may decide to stop searching at any time (for example, he may feel that the rewards are not large enough to justify the searching costs). If the searcher decides to stop before finding the object then from that point on he incurs no further costs and of course receives no reward.


In the second section of this paper we show that an optimal strategy exists and is defined by a functional equation. The optimal strategy is exhibited in a special case. The third section deals with the optimal $n$-stage return function. The fourth section presents some counterexamples, and in the fifth section we present the major results. Speaking loosely, we show that the optimal strategy either searches the box with maximal value of $\alpha_i p_i / c_i$ or else it never searches that box. Also, if rewards are equal, $R_i \equiv R$, then the optimal strategy either searches the box with maximal $\alpha_i p_i / c_i$ or else it stops. In the final section we assume that $R_i \equiv R$ and present a sequence of strategies converging to the optimal.
2. Optimal Strategy

A strategy is any sequence (or partial sequence) $\delta = (\delta_1, \ldots, \delta_s)$ where $\delta_i \in \{1, 2, \ldots, m\}$ for $i = 1, \ldots, s$ and $s \in \{0, 1, 2, \ldots, \infty\}$. The policy $\delta$ instructs the searcher to search box $\delta_i$ at the $i$th period and to stop searching if the object hasn't been found after the $s$th search. ($s = 0$ means that the searcher stops immediately and $s = \infty$ means that he doesn't stop until he finds the object.)

For any strategy $\delta$ and any $P = (p_1, \ldots, p_m)$, $p_i \ge 0$, $\sum_i p_i = 1$, let $f(P, \delta)$ be the risk (expected searching cost minus expected reward) incurred when $P$ is the vector of prior probabilities and strategy $\delta$ is employed. Also let $f(P) = \inf_\delta f(P, \delta)$. Then it follows from standard arguments (see for instance [1], p. 83) that

$$f(P) = \min\Big\{0,\ \min_{i=1,\ldots,m}\big[c_i - \alpha_i p_i R_i + (1 - \alpha_i p_i) f(T_i P)\big]\Big\} \tag{1}$$

where $T_i P = ((T_i P)_1, \ldots, (T_i P)_m)$, $i = 1, 2, \ldots, m$, and where

$$(T_i P)_j = \begin{cases} p_j (1 - \alpha_i p_i)^{-1}, & j \ne i \\ (1 - \alpha_i) p_i (1 - \alpha_i p_i)^{-1}, & j = i \end{cases} \tag{2}$$

Thus $(T_i P)_j$ is just the posterior probability that the object is in box $j$ given that a search of $i$ has not uncovered it. We shall say that the process is in state $P$ at time $t$ if $P$ denotes the posterior probability vector at time $t$.
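
The posterior update in (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `posterior_after_miss`, the 0-based box indices, and the two-box numbers are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def posterior_after_miss(p, alpha, i):
    """Return T_i P, the posterior after an unsuccessful search of box i (0-based)."""
    miss = 1 - alpha[i] * p[i]  # probability that the search of box i fails
    if miss == 0:
        raise ValueError("a search of box i finds the object with certainty")
    return [
        ((1 - alpha[i]) * p[j] if j == i else p[j]) / miss
        for j in range(len(p))
    ]

# Hypothetical two-box instance: the second box overlooks the object 35% of the time.
p = [Fraction(2, 5), Fraction(3, 5)]
alpha = [Fraction(1), Fraction(13, 20)]
q = posterior_after_miss(p, alpha, 1)
print(q)  # the entries still sum to one
```

Exact fractions are used so that the normalisation of the posterior can be checked without rounding error.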


In order to show the existence of an optimal strategy let $R = \max_i R_i$ and consider a related process (the prime process) with $c'_i = c_i$, $\alpha'_i = \alpha_i$, but with $R'_i = R_i - R$. However, for this new process we suppose that a penalty cost of $R$ units is imposed if the searcher decides to stop searching before finding the object. Now it is easy to see that for any strategy $\delta$ which terminates (either by finding the object or by stopping) in finite expected time we have $f(P, \delta) = f'(P, \delta) - R$, and since these are the only strategies we need consider (any strategy which doesn't terminate in finite expected time has $f(P, \delta) = f'(P, \delta) = \infty$), it follows that any strategy optimal for the prime process is optimal for the original one.* However, the prime process is a dynamic programming process with a finite number of possible actions available at each stage and with non-positive returns at each stage (since $R'_i \le 0$ for all $i$). It then follows from Strauch [6] that an optimal strategy exists and also that the optimal strategies may be characterized as those strategies which, when the process is in state $P$, choose one of the actions which minimize the right side of (1), i.e. for such a $\delta^*$, $f(P, \delta^*) = f(P)$ for all $P$.

The importance of rigorously proving that an optimal policy exists and is determined by a functional equation cannot be overemphasized. For example, in the above suppose we relax the condition that $c_i > 0$ and let $c_1 = 0$. Then if $\alpha_1 p_1 > 0$ it is clear that for any strategy $\delta = (\delta_1, \ldots, \delta_s) \ne (1, 1, 1, \ldots)$, $f(P, (1, \delta_1, \ldots, \delta_s)) < f(P, (\delta_1, \ldots, \delta_s))$ (since a search of 1 is free) and thus the only possible optimal strategy would be $\delta_1 = (1, 1, 1, \ldots)$. However $f(P, \delta_1) = -p_1 R_1$ and it is clear that this need not be minimal. For example, if $c_1 = 0$, $\alpha_1 = 1/2$, $p_1 = 1/10$, $R_1 = 10$ and $c_2 = 1$, $\alpha_2 = 1$, $p_2 = 9/10$, $R_2 = 10$ then $f(P, \delta_1) = -1$, while the strategy that searches box 1 $n$ times, then box 2, then box 1 forever has

$$f(P, (1, 1, \ldots, 1, 2, 1, 1, 1, \ldots)) = -\tfrac{1}{10}\big[10(1 - (1/2)^n) + 9(1/2)^n\big] + \tfrac{9}{10}(-9) = -9.1 + \tfrac{1}{10}(1/2)^n.$$

Also the strategy determined by the functional equation turns out to be the (non-optimal) strategy $\delta_1$. (The reason that the existence proof given above breaks down is that since $c_1 = 0$ it no longer follows that all strategies $\delta$ with infinite expected termination time have $f(P, \delta) = \infty$.)

*The above argument also shows that there is no additional generality gained in assuming that a penalty cost $c$ is incurred when the searcher stops without finding the object, as this process would just be equivalent to the original one with rewards $R_i + c$ instead of $R_i$.


Now consider the class $A$ of strategies $\delta = (\delta_1, \ldots, \delta_s)$ for which $s = \infty$. Any policy $\delta \in A$ which finds the object with probability 1 will have $f(P, \delta) = E_\delta L - \sum_i p_i R_i$ where $L$ is the searching cost incurred; any $\delta \in A$ which has positive probability of never finding the object has $f(P, \delta) = \infty$. Thus among the class of policies which never stop searching until the object is found the one with minimal expected searching cost is best. Thus by Blackwell's result the strategy $\hat{\delta}$ which, when in state $P$, searches the box (or one of the boxes) with the maximal value of $\alpha_i p_i / c_i$ is optimal among the policies in $A$.


Lemma 2.1: If $\alpha_i p_i R_i > c_i$ for some $i$ then no optimal strategy stops searching at $P = (p_1, \ldots, p_m)$. If $\alpha_i p_i R_i \ge c_i$ for some $i$ then there is an optimal strategy which doesn't stop at $P$.

Proof: From (1) we have that

$$f(P) \le c_i - \alpha_i p_i R_i + (1 - \alpha_i p_i) f(T_i P) < 0 + (1 - \alpha_i p_i) f(T_i P) \le 0$$

and so $f(P) < 0$ and thus no optimal policy stops at $P$. If $\alpha_i p_i R_i \ge c_i$ then $f_i(P) = c_i - \alpha_i p_i R_i + (1 - \alpha_i p_i) f(T_i P) \le 0$, where $f_i(P)$ denotes the risk of searching $i$ at $P$ and proceeding optimally thereafter. Now if $f(P) = 0$ then $f(P) = f_i(P)$ and so searching $i$ is optimal; if $f(P) < 0$ then stopping is not optimal. Q.E.D.

Theorem 2.2: If $\sum_{i=1}^{m} c_i / (\alpha_i R_i) \le 1$ then $\hat{\delta}$ is optimal, i.e. $f(P, \hat{\delta}) = f(P)$ for all $P$.

Proof: For any $P$, if $\max_i (\alpha_i p_i R_i - c_i) \ge 0$ then there exists an optimal strategy which doesn't stop at $P$. So a necessary condition for every optimal strategy to stop at $P$ is for

$$\alpha_i p_i R_i < c_i \ \text{for all } i \;\Rightarrow\; p_i < c_i / (\alpha_i R_i) \ \text{for all } i \;\Rightarrow\; 1 < \sum_i c_i / (\alpha_i R_i).$$

So if $\sum_i c_i / (\alpha_i R_i) \le 1$ then for every $P$ there is an optimal strategy which doesn't stop at $P$. Thus an optimal strategy exists in $A$, which implies that $\hat{\delta}$ is optimal.

Q.E.D.
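
As a rough illustration of Theorem 2.2, the sketch below checks the condition $\sum_i c_i/(\alpha_i R_i) \le 1$ and picks a box with maximal $\alpha_i p_i / c_i$. The function names and the numerical data are hypothetical, and boxes are indexed from 0.

```python
from fractions import Fraction

def never_stop_condition(alpha, c, R):
    """True when sum_i c_i / (alpha_i * R_i) <= 1, the hypothesis of Theorem 2.2."""
    return sum(ci / (ai * Ri) for ai, ci, Ri in zip(alpha, c, R)) <= 1

def best_ratio_box(p, alpha, c):
    """0-based index of a box with maximal alpha_i * p_i / c_i (first one on ties)."""
    return max(range(len(p)), key=lambda i: alpha[i] * p[i] / c[i])

# Hypothetical data, chosen only to exercise the two functions.
alpha = [Fraction(1, 2), Fraction(1)]
c = [Fraction(1), Fraction(2)]
R = [Fraction(10), Fraction(10)]
p = [Fraction(1, 4), Fraction(3, 4)]

print(never_stop_condition(alpha, c, R))  # the sum is 1/5 + 1/5 = 2/5
print(best_ratio_box(p, alpha, c))        # ratios are 1/8 and 3/8
```

When the condition holds, repeatedly applying `best_ratio_box` to the current posterior reproduces the never-stopping rule described above.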
3. The Optimal Return f(P)

Theorem 3.1: $f(P)$ is a concave function of $P$.

Proof: Let $f_i(\delta)$ be the conditional risk given that the object is in $i$ and strategy $\delta$ is employed, $i = 1, \ldots, m$. Then $f(P, \delta) = \sum_i p_i f_i(\delta)$. Now let $P = \lambda P^1 + (1 - \lambda) P^2$; then

$$f(P) = \inf_\delta f(P, \delta) = \inf_\delta f(\lambda P^1 + (1 - \lambda) P^2, \delta) = \inf_\delta \sum_i (\lambda P^1 + (1 - \lambda) P^2)_i f_i(\delta)$$
$$\ge \lambda \inf_\delta \sum_i p^1_i f_i(\delta) + (1 - \lambda) \inf_\delta \sum_i p^2_i f_i(\delta) = \lambda f(P^1) + (1 - \lambda) f(P^2).$$

Q.E.D.

Corollary 3.2: The optimal stop region $S = \{P : f(P) = 0\}$ is convex.

Proof: Suppose $P = \lambda P^1 + (1 - \lambda) P^2$ and $f(P^1) = f(P^2) = 0$. Then $f(P) \le 0$ by (1) and $f(P) \ge 0$ by the above.

Q.E.D.


Let

$$f_1(P) = \min\Big\{0,\ \min_i \big[c_i - \alpha_i p_i R_i\big]\Big\}$$
$$f_n(P) = \min\Big\{0,\ \min_i \big[c_i - \alpha_i p_i R_i + (1 - \alpha_i p_i) f_{n-1}(T_i P)\big]\Big\}, \quad n > 1 \tag{3}$$

Thus $f_n(P)$ is just the minimal risk incurred if the searcher is allowed at most $n$ searches. Clearly $f_n(P) \ge f_{n+1}(P) \ge f(P)$ for all $n$, all $P$, and it seems reasonable that $f_n(P) \downarrow f(P)$ as $n \uparrow \infty$. This is shown in the following.
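
The recursion (3) can be evaluated directly for small $n$. The sketch below is one way to do so, assuming exact rational arithmetic and 0-based box indices; the helper names `t_update` and `f_n` are hypothetical. The data are those of Example 3 in Section 4 (with $R_2 = 100$, as the displayed computation there indicates), for which the text reports that the two-stage value is zero while the three-stage value is negative.

```python
from fractions import Fraction

def t_update(p, alpha, i):
    """Posterior T_i P after an unsuccessful search of box i."""
    miss = 1 - alpha[i] * p[i]
    return tuple(
        ((1 - alpha[i]) * p[j] if j == i else p[j]) / miss for j in range(len(p))
    )

def f_n(n, p, alpha, c, R):
    """Minimal risk when at most n searches are allowed, following recursion (3)."""
    if n == 0:
        return Fraction(0)
    best = Fraction(0)  # stopping immediately has risk 0
    for i in range(len(p)):
        hit = alpha[i] * p[i]  # probability this search finds the object
        risk = c[i] - hit * R[i]
        if hit < 1:  # otherwise the search surely succeeds and nothing follows
            risk += (1 - hit) * f_n(n - 1, t_update(p, alpha, i), alpha, c, R)
        best = min(best, risk)
    return best

p = (Fraction(2, 5), Fraction(3, 5))
alpha = (Fraction(1), Fraction(13, 20))
c = (50, 50)
R = (100, 100)
f1, f2, f3 = (f_n(n, p, alpha, c, R) for n in (1, 2, 3))
print(f1, f2, f3)
```

The running time grows like $m^n$, so this brute-force evaluation is only meant for checking small cases by hand.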

Letting $c = \min_i c_i$, $D = \max_i (R_i - c_i)$, we have

Theorem 3.3: $f_n(P) - f(P) \le D^2 / (nc)$ for all $n$, all $P$.

Proof: Let $\delta^*$ be an optimal strategy, let $T$ be the random number of times $\delta^*$ searches before terminating, and let $\delta^*_n$ be $\delta^*$ terminated at $n$, i.e. $\delta^*_n = (\delta^*_1, \ldots, \delta^*_{\min(s, n)})$. Then

$$f(P) = f(P, \delta^*) = E_{\delta^*}[X \mid T \le n] \Pr[T \le n] + E_{\delta^*}[X \mid T > n] \Pr[T > n] \tag{4}$$

$$f_n(P) \le f(P, \delta^*_n) = E_{\delta^*}[X \mid T \le n] \Pr[T \le n] + E_{\delta^*_n}[X \mid T > n] \Pr[T > n] \tag{5}$$

where $X$ denotes the total cost incurred (and everything is understood to be conditional on the prior probability vector $P$). Thus

$$f_n(P) - f(P) \le \big[E_{\delta^*_n}[X \mid T > n] - E_{\delta^*}[X \mid T > n]\big] \Pr[T > n] \le D \Pr[T > n] \tag{6}$$

To get a bound on $\Pr[T > n]$ we use (4) to get

$$0 \ge f(P) \ge -D \Pr[T \le n] + (-D + nc) \Pr[T > n] = -D + nc \Pr[T > n] \tag{7}$$

$$\Pr[T > n] \le D / (nc) \tag{8}$$

The result follows from (6) and (8).

Q.E.D.
Corollary 3.4: If $\alpha_i R_i \le c_i$ for all $i = 1, 2, \ldots, m$ then $f(P) \equiv 0$, i.e. the policy which never searches is optimal.

Proof: It follows from (3) that $f_1(P) \equiv 0$, and by induction that $f_n(P) \equiv 0$ for all $n$, and thus by the above $f(P) \equiv 0$. Q.E.D.

The above Corollary may also be proven directly by letting $e^i$ be the $m$-vector of all zeroes except for a one in the $i$th spot. If $\alpha_i R_i \le c_i$ for all $i$ then by (1) it follows that $f(e^i) = 0$, $i = 1, \ldots, m$; and thus by concavity $f(P) \equiv 0$.
4. Counter-Examples

Consider the following three conjectures:

1. If $c_i > R_i$ then an optimal strategy will never search box $i$.

2. If an optimal strategy doesn't stop at $P$ then it searches a box with maximal $\alpha_i p_i / c_i$.

3. If $m$ is the number of boxes then an $m$-stage look ahead strategy is optimal; where an $m$-stage look ahead strategy is defined as any strategy which stops at $P$ if $f_m(P) = 0$, and searches the $i$th box at $P$ if $f_m(P) = c_i - \alpha_i p_i R_i + (1 - \alpha_i p_i) f_{m-1}(T_i P)$.

We shall now give examples showing that each of these conjectures need not hold.


Example 1:

    i    alpha_i    p_i    c_i    R_i
    1    1          3/4    5      0
    2    1          1/4    10     210

If the searcher first searches 2 and then acts optimally his risk is $10 - \frac{1}{4} \cdot 210 = -170/4$; while if he first searches 1 and then acts optimally his risk is $5 - \frac{1}{4} \cdot 200 = -45 < -170/4$. Thus the optimal strategy starts by searching 1.


Example 2:

    i    alpha_i    p_i    c_i    R_i
    1    1          3/4    10     0
    2    1          1/4    10     210

If the searcher first searches 1 then his minimal risk is $10 - \frac{1}{4} \cdot 200 = -40$, while if he first searches 2 his minimal risk is $10 - \frac{1}{4} \cdot 210 < -40$. Thus the optimal strategy starts by searching 2. However $\alpha_1 p_1 / c_1 = 3/40 > 1/40 = \alpha_2 p_2 / c_2$.


Example 3:

    i    alpha_i    p_i    c_i    R_i
    1    1          .4     50     100
    2    .65        .6     50     100

It can be checked directly that $f_2(.4, .6) = 0$ and so the two-stage look ahead strategy stops. However

$$f_3(.4, .6) \le .4(-50) + .6[100 - (.65)100 + .35(50 - 100(.65))] < 0$$

and so the two-stage look ahead strategy is not optimal.

Thus none of the conjectures need be true. We will later show, however, that in a special case ($R_i \equiv R$) conjectures 1 and 2 are in fact true.
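
The risks quoted in the three examples can be reproduced by evaluating a fixed finite search order. The sketch below is an illustrative check, not part of the original argument: the function name `strategy_risk` is hypothetical, boxes are 0-based, and Example 3 is taken with $R_2 = 100$ as in the table above.

```python
from fractions import Fraction as F

def strategy_risk(p, alpha, c, R, order):
    """Expected cost minus expected reward of searching `order`, then stopping."""
    mass = list(p)  # mass[j] = Pr(object is in box j and has not been found yet)
    risk = F(0)
    for i in order:
        risk += sum(mass) * c[i]  # the search is paid for only if still searching
        found = alpha[i] * mass[i]
        risk -= found * R[i]
        mass[i] -= found
    return risk

one = F(1)
# Example 1: searching box 2 only, versus box 1 and then box 2.
ex1 = dict(p=[F(3, 4), F(1, 4)], alpha=[one, one], c=[5, 10], R=[0, 210])
r1_two_first = strategy_risk(order=[1], **ex1)
r1_one_first = strategy_risk(order=[0, 1], **ex1)

# Example 2: same data except c_1 = 10.
ex2 = dict(p=[F(3, 4), F(1, 4)], alpha=[one, one], c=[10, 10], R=[0, 210])
r2_one_first = strategy_risk(order=[0, 1], **ex2)
r2_two_first = strategy_risk(order=[1], **ex2)

# Example 3: box 1 once, then box 2 twice.
ex3 = dict(p=[F(2, 5), F(3, 5)], alpha=[one, F(13, 20)], c=[50, 50], R=[100, 100])
r3 = strategy_risk(order=[0, 1, 1], **ex3)
print(r1_two_first, r1_one_first, r2_one_first, r2_two_first, r3)
```

In Examples 1 and 2 the continuation after the listed searches is to stop, which is optimal there because the remaining box either holds no probability mass or offers no reward.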
5. Main Theorems

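For any strategy $\delta$ let $(i, j, \delta)$ be the strategy which first searches $i$, then $j$, and then follows strategy $\delta$.

We shall need the following

Lemma 5.1: For any strategy $\delta$ such that $f(P, \delta) < \infty$,

$$f(P, (i, j, \delta)) \ge f(P, (j, i, \delta)) \quad \text{iff} \quad \alpha_i p_i / c_i \le \alpha_j p_j / c_j.$$

Proof:

$$f(P, (i, j, \delta)) = c_i - \alpha_i p_i R_i + (1 - \alpha_i p_i)\Big[c_j - \frac{\alpha_j p_j}{1 - \alpha_i p_i} R_j + \Big(1 - \frac{\alpha_j p_j}{1 - \alpha_i p_i}\Big) f(T_j T_i P, \delta)\Big]$$

$$f(P, (j, i, \delta)) = c_j - \alpha_j p_j R_j + (1 - \alpha_j p_j)\Big[c_i - \frac{\alpha_i p_i}{1 - \alpha_j p_j} R_i + \Big(1 - \frac{\alpha_i p_i}{1 - \alpha_j p_j}\Big) f(T_i T_j P, \delta)\Big]$$

Now since $T_j T_i P = T_i T_j P$ it follows that

$$f(P, (i, j, \delta)) - f(P, (j, i, \delta)) = \alpha_j p_j c_i - \alpha_i p_i c_j.$$

Q.E.D.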


Notation: For any policy $\delta = (\delta_1, \ldots, \delta_s)$ and $t \le s$, let

$$P_{\delta,t} = T_{\delta_t} T_{\delta_{t-1}} \cdots T_{\delta_1} P.$$

Thus $P_{\delta,t}$ is just the posterior probability vector given that $\delta$ is employed and the item has not been found after $t$ searches.
Theorem 5.2: If $\alpha_i p_i^0 / c_i = \max_j \alpha_j p_j^0 / c_j$ then

(a) If $\alpha_i p_i^0 R_i \ge c_i$ then there is an optimal strategy $\delta^*$ having $\delta^*_1 = i$.

(b) If there does not exist an optimal strategy with $\delta^*_1 = i$ then no optimal strategy ever searches $i$.

Proof: (a) We first show that there is an optimal strategy $\delta^*$ having $\delta^*_k = i$ for some $k \le s$. For suppose that no optimal strategy ever searched $i$; then for any optimal strategy $\delta$, $(P_{\delta,t})_i \ge p_i^0$ for all $t$, and so by Lemma 2.1 the optimal strategy need not stop. But then $\hat{\delta}$ is optimal and so there would be an optimal strategy with $\delta_1 = i$. Thus there is an optimal strategy $\delta^*$ which searches $i$. Let $k$ be the first time $\delta^*$ searches $i$. If $k \ne 1$ then since

$$(P_{\delta^*,k-2})_j = \frac{\epsilon_j p_j^0}{\sum_l \epsilon_l p_l^0}, \quad \text{where } \epsilon_j \le 1 \text{ and } \epsilon_i = 1,$$

it follows that

$$\frac{\alpha_i (P_{\delta^*,k-2})_i}{c_i} = \max_j \frac{\alpha_j (P_{\delta^*,k-2})_j}{c_j}$$

and so by Lemma 5.1 there is an optimal strategy with $\delta_{k-1} = i$. By induction we see that there is an optimal strategy with $\delta^*_1 = i$.

(b) We have shown by the above that if an optimal strategy $\delta^*$ has $\delta^*_k = i$ for some $k$ then there is an optimal strategy with $\delta^*_1 = i$.

Q.E.D.


Corollary 5.3: If $\alpha_i p_i^0 / c_i > \alpha_j p_j^0 / c_j$ for $j \ne i$ then either

(a) every optimal strategy has $\delta_1 = i$
or
(b) no optimal strategy ever searches $i$.

Proof: Follows in the same manner as in the previous Theorem.


Note that if the state of the process at time $t$ is $P$ then from that point on we can consider the process as starting anew with prior probability vector $P$. Thus at time $t$ it is optimal to search the box with the largest present value of $\alpha_i p_i / c_i$ or else that box is never searched from that point on. We are able to prove a stronger result in the special case where all rewards are equal.

Theorem 5.4: Suppose $R_i \equiv R$ for all $i$. If $\alpha_i p_i^0 / c_i = \max_j \alpha_j p_j^0 / c_j$ then either

(a) there is an optimal strategy with $\delta_1 = i$
or
(b) the only optimal strategy is the one which does not search, i.e. $s = 0$.

Proof: Let $\delta^* = (\delta^*_1, \ldots, \delta^*_s)$ be an optimal strategy. If $\delta^*$ ever searches $i$ then we can show by successive permutations (as in Theorem 5.2) that there is an optimal strategy with $\delta_1 = i$. If $\delta^*$ never searches $i$ then $s < \infty$, for if $\delta^*$ didn't stop and never searched $i$ then it would have infinite risk and so wouldn't be optimal. Suppose now that $s \ne 0$ and let $k = \delta^*_s$. Since $k$ will be the last search made it follows that $\alpha_k (P_{\delta^*,s-1})_k R \ge c_k$ (or else it would be better not to make the last search). But since $\delta^*$ never searches $i$ it follows that $(P_{\delta^*,s})_i \ge (P_{\delta^*,s-1})_i$, and thus

$$\frac{\alpha_i (P_{\delta^*,s})_i}{c_i} \ge \frac{\alpha_i (P_{\delta^*,s-1})_i}{c_i} \ge \frac{\alpha_k (P_{\delta^*,s-1})_k}{c_k} \ge \frac{1}{R}.$$

But then by Lemma 2.1 it would be optimal to search $i$ at time $s + 1$, and so by the above there would be an optimal strategy with $\delta_1 = i$.

Q.E.D.


In a similar manner we may prove the following

Corollary 5.5: If $R_i \equiv R$ and if $\alpha_i p_i^0 / c_i \ne \max_j \alpha_j p_j^0 / c_j$, then any strategy $\delta$ with $\delta_1 = i$ is not optimal.

Proof: Let $\ell$ be such that $\alpha_\ell p_\ell^0 / c_\ell = \max_j \alpha_j p_j^0 / c_j$. If $\delta$ searches $\ell$ at some time then by successively permuting and using Lemma 5.1 it follows that we may (strictly) improve upon $\delta$. If $\delta$ never searches $\ell$ then by the same reasoning as used in the above Theorem it follows that $\delta$ can't be optimal.

Q.E.D.


Thus when all rewards are equal it is either optimal to search a box with the maximal value of $\alpha_i p_i / c_i$ or else it is optimal to stop.

In [3] Chew considered the problem where there is no reward given for finding the object but where there is a penalty cost $C$ incurred if the searcher stops without finding the object. He also supposed that $\alpha_1 = 0$ and $p_1^0 > 0$.* (Thus there is positive probability that the object is in the first box but with probability one a search would overlook it.) He showed that if $c_i \equiv 1$ then the optimal strategy either searches the box with maximal $\alpha_i p_i / c_i$ or else stops. However, as was previously pointed out, this problem is equivalent to the one we've considered with $R_i \equiv C$. Thus Theorem 5.4 may be considered as an extension of Chew's result to non-constant costs and to general overlook probabilities.

*Actually Chew supposed that $\sum_i p_i^0 < 1$. However this is clearly equivalent to having $\sum_i p_i^0 = 1$ and having a box with an overlook probability of one.
6. Approximations to Optimal Strategy

In this section we suppose that $R_i \equiv R$, and exhibit a sequence of strategies which converge to an optimal strategy.

Let $\delta^* = (\delta^*_1, \ldots, \delta^*_s)$ be an optimal strategy which, when in state $P$, either stops if $f(P) = 0$ or else searches a box with maximal value of $\alpha_i p_i / c_i$. Let $T$ be the random number of stages $\delta^*$ searches before terminating, and recall that $c = \min_i c_i$. We shall need the following:


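Lemma 6.1: $\Pr(T > n) \le \big(1 - c / \sum_i (c_i / \alpha_i)\big)^n$ for all $n$.

Proof: The minimal value of $\max_i \alpha_i p_i / c_i$ is achieved by that vector $P$ having

$$\alpha_1 p_1 / c_1 = \alpha_2 p_2 / c_2 = \cdots = \alpha_m p_m / c_m \tag{9}$$

and thus

$$\min_P \max_i \alpha_i p_i / c_i = \frac{1}{\sum_i c_i / \alpha_i} \tag{10}$$

Now each time $\delta^*$ searches, it searches a box with maximal value of $\alpha_i p_i / c_i$. Thus each time $\delta^*$ searches a box (say box $j$) the probability $\alpha_j p_j$ that the item will be found is such that

$$\alpha_j p_j \ge \frac{c_j}{\sum_i c_i / \alpha_i} \ge \frac{c}{\sum_i c_i / \alpha_i} \tag{11}$$

The result follows immediately.

Q.E.D.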
Now let $\delta_n = (\delta_{n,1}, \ldots, \delta_{n,s_n})$ be the strategy which, when in state $P$, stops if $f_n(P) = 0$ or else searches a box with maximal value of $\alpha_i p_i / c_i$, i.e.

$$s_n = \min\{k : f_n(P_{\delta^*,k}) = 0\}.$$

Since $f_n(P) \downarrow f(P)$ it follows that $s_n \uparrow s$ as $n \uparrow \infty$.

Recalling that $D = \max_i (R - c_i) = R - c$ we have


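Theorem 6.2: $f(P, \delta_n) \le f(P) + D \big(1 - c / \sum_i (c_i / \alpha_i)\big)^n$ for all $P$, all $n$.

Proof: Since $\delta_n$ and $\delta^*$ agree until $\delta_n$ stops, and $f_n(P_{\delta^*,s_n}) = 0$,

$$f(P, \delta_n) - f(P) = \big[f_n(P_{\delta^*,s_n}) - f(P_{\delta^*,s_n})\big] \Pr(T > s_n) \le D \Pr(T > s_n + n \mid T > s_n) \Pr(T > s_n)$$

where the last inequality follows from (6) applied with $P_{\delta^*,s_n}$ as the prior. The result then follows from Lemma 6.1, whose argument bounds the conditional probability by $\big(1 - c / \sum_i (c_i / \alpha_i)\big)^n$.

Q.E.D.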
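
To get a feel for the geometric bound, the sketch below finds the smallest $n$ for which $D (1 - c / \sum_i (c_i/\alpha_i))^n$ drops below a tolerance. The function name `stages_needed`, the tolerance `eps`, and the numerical data are hypothetical; equal rewards $R_i \equiv R$ are assumed, as in this section.

```python
from fractions import Fraction

def stages_needed(alpha, c, R, eps):
    """Smallest n with D * (1 - q)**n <= eps, where q = min(c) / sum(c_i / alpha_i)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    q = min(c) / sum(ci / ai for ci, ai in zip(c, alpha))
    bound = R - min(c)  # D = max_i (R - c_i) = R - min_i c_i
    n = 0
    while bound > eps:
        bound *= 1 - q
        n += 1
    return n

# Hypothetical data: two boxes, unit costs, detection probability 1/2 each.
alpha = [Fraction(1, 2), Fraction(1, 2)]
c = [Fraction(1), Fraction(1)]
n = stages_needed(alpha, c, Fraction(10), Fraction(1))
print(n)  # here q = 1/4 and D = 9
```

The loop is used instead of a logarithm so that exact fractions avoid rounding at the boundary.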


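In order to effectively apply the policies $\delta_n$, $n \ge 1$, we need to be able to characterize the continuation sets $A_n = \{P : f_n(P) < 0\}$. These sets can be constructed as follows:

$$A_1 = \{P : \exists\, i : c_i - \alpha_i p_i R < 0\}$$

$$A_2 = A_1 \cup B_2$$

where

$$B_2 = \big\{P : \exists\, i, j : c_i - \alpha_i p_i R + (1 - \alpha_i p_i)\big[c_j - \alpha_j (T_i P)_j R\big] < 0\big\} \tag{13}$$

Noting that $(T_i P)_j = (1 - \alpha_i \delta_{ij}) p_j (1 - \alpha_i p_i)^{-1}$, where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise, we can write

$$B_2 = \{P : \exists\, i, j : c_i - \alpha_i p_i R + c_j - \alpha_j p_j R - \alpha_i p_i c_j + \alpha_i \alpha_j \delta_{ij} p_j R < 0\} \tag{14}$$

Similarly

$$A_3 = A_2 \cup B_3$$

where

$$B_3 = \big\{P : \exists\, i, j, k : c_i - \alpha_i p_i R + (1 - \alpha_i p_i)\big[c_j - \alpha_j (T_i P)_j R + (1 - \alpha_j (T_i P)_j)(c_k - \alpha_k (T_j T_i P)_k R)\big] < 0\big\} \tag{15}$$
$$= \big\{P : \exists\, i, j, k : c_i - \alpha_i p_i R + c_j - \alpha_j p_j R + c_k - \alpha_k p_k R - \alpha_i p_i c_j - (\alpha_i p_i + \alpha_j p_j) c_k + \alpha_i \alpha_j \delta_{ij} p_j (R + c_k) + \alpha_k p_k (\alpha_i \delta_{ik} + \alpha_j \delta_{jk} - \alpha_i \alpha_j \delta_{ik} \delta_{jk}) R < 0\big\}$$

Similarly the other $A_n = A_{n-1} \cup B_n$ may be obtained. Also we may let

$$B'_1 = A_1 \tag{16}$$
$$B'_2 = \{P : \exists\, i \ne j : c_i - \alpha_i p_i R + c_j - \alpha_j p_j R - \alpha_i p_i c_j < 0\}$$
$$B'_3 = \{P : \exists\, i \ne j \ne k : c_i - \alpha_i p_i R + c_j - \alpha_j p_j R + c_k - \alpha_k p_k R - \alpha_i p_i c_j - (\alpha_i p_i + \alpha_j p_j) c_k < 0\}$$

Then $B'_n \subset B_n$ and we may approximate $A_n$ by $\bigcup_{i=1}^{n} B'_i$. We also note that $B'_1 = A_1$ and $B'_1 \cup B'_2 = A_2$.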
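
Membership in the first two approximating sets reduces to a handful of inequalities, which the sketch below tests directly. The function names `in_A1` and `in_B2_prime` are hypothetical, boxes are 0-based, and the data are the equal-reward instance of Example 3 ($R = 100$) plus one extra illustrative point.

```python
from fractions import Fraction as F
from itertools import permutations

def in_A1(p, alpha, c, R):
    """True if a single search of some box already has negative risk."""
    return any(c[i] - alpha[i] * p[i] * R < 0 for i in range(len(p)))

def in_B2_prime(p, alpha, c, R):
    """True if searching two distinct boxes i then j has negative risk."""
    return any(
        c[i] - alpha[i] * p[i] * R + c[j] - alpha[j] * p[j] * R
        - alpha[i] * p[i] * c[j] < 0
        for i, j in permutations(range(len(p)), 2)
    )

alpha = [F(1), F(13, 20)]
c = [50, 50]
R = 100

p_stop = [F(2, 5), F(3, 5)]  # the point (.4, .6) of Example 3
p_go = [F(1, 2), F(1, 2)]    # an extra illustrative point
print(in_A1(p_stop, alpha, c, R), in_B2_prime(p_stop, alpha, c, R))
print(in_A1(p_go, alpha, c, R), in_B2_prime(p_go, alpha, c, R))
```

At $(.4, .6)$ neither test succeeds, in line with the statement $f_2(.4, .6) = 0$ of Example 3, while at $(.5, .5)$ a two-search plan is already worthwhile.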

